Bayesian Variable Selection in Clustering High-Dimensional Data With Substructure
Abstract
In this article we focus on clustering techniques recently proposed for high-dimensional data that incorporate variable selection, and we extend them to the modeling of data with a known substructure, such as the structure imposed by an experimental design. Our method essentially approximates the within-group covariance by facilitating clustering without disrupting the groups defined by the experimenter. The method we adopt simultaneously determines which expression patterns are important and which genes contribute to those patterns. We evaluate performance on simulated data and on microarray data from a colon carcinogenesis study. The selected genes are biologically consistent with current research and provide strong biological validation of the cluster configuration identified by the method.
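The idea of clustering without splitting experimenter-defined groups while also selecting variables can be illustrated with a small sketch. This is not the Bayesian model of the article: it swaps in a simple between-group to within-group variance ratio as a stand-in for variable selection, and plain average-linkage agglomerative clustering of group means. The function names (`group_means`, `variance_ratio`, `select_variables`, `cluster_groups`) and the toy data are hypothetical and exist only for illustration.

```python
from statistics import mean

# Toy data, invented for illustration: 4 design groups x 2 replicates x 4 variables.
# Variables 0 and 1 separate {A, B} from {C, D}; variables 2 and 3 are noise.
data = [
    [0.1, 0.2, 1.0, 3.0], [-0.1, 0.0, 3.0, 1.0],   # group A
    [0.2, -0.1, 2.0, 1.5], [0.0, 0.1, 1.5, 2.5],   # group B
    [5.1, 4.9, 1.2, 2.8], [4.9, 5.2, 2.9, 1.1],    # group C
    [5.0, 5.1, 2.2, 1.4], [5.2, 4.8, 1.6, 2.6],    # group D
]
groups = ["A", "A", "B", "B", "C", "C", "D", "D"]


def group_means(data, groups):
    """Average the replicates of each experimenter-defined group, per variable."""
    means = {}
    for g in sorted(set(groups)):
        rows = [row for row, lab in zip(data, groups) if lab == g]
        means[g] = [mean(col) for col in zip(*rows)]
    return means


def variance_ratio(data, groups, j):
    """Between-group over within-group sum of squares for variable j."""
    grand = mean(row[j] for row in data)
    between = within = 0.0
    for g in set(groups):
        vals = [row[j] for row, lab in zip(data, groups) if lab == g]
        m = mean(vals)
        between += len(vals) * (m - grand) ** 2
        within += sum((v - m) ** 2 for v in vals)
    return between / (within + 1e-12)  # small constant avoids division by zero


def select_variables(data, groups, k):
    """Keep the k variables with the largest variance ratio (a non-Bayesian stand-in)."""
    p = len(data[0])
    scores = [variance_ratio(data, groups, j) for j in range(p)]
    top = sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]
    return sorted(top)


def cluster_groups(means, variables, n_clusters):
    """Average-linkage agglomerative clustering of whole groups on chosen variables."""
    clusters = [[g] for g in sorted(means)]

    def dist(a, b):
        # Mean Euclidean distance between group means across the two clusters.
        return mean(
            sum((means[x][j] - means[y][j]) ** 2 for j in variables) ** 0.5
            for x in a for y in b
        )

    while len(clusters) > n_clusters:
        i, k = min(
            ((i, k) for i in range(len(clusters)) for k in range(i + 1, len(clusters))),
            key=lambda ik: dist(clusters[ik[0]], clusters[ik[1]]),
        )
        clusters[i] = clusters[i] + clusters[k]
        del clusters[k]
    return [sorted(c) for c in clusters]


selected = select_variables(data, groups, k=2)
clusters = cluster_groups(group_means(data, groups), selected, n_clusters=2)
print(selected)   # [0, 1]
print(clusters)   # [['A', 'B'], ['C', 'D']]
```

Because the clustering operates on group means rather than on individual samples, replicates from the same design group can never end up in different clusters, which is the constraint the abstract describes. The noise variables are dropped before distances are computed, so they cannot mask the signal.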
Similar resources
High-Dimensional Bayesian Clustering with Variable Selection: The R Package bclust
The R package bclust is useful for clustering high-dimensional continuous data. The package uses a parametric spike-and-slab Bayesian model to downweight the effect of noise variables and to quantify the importance of each variable in agglomerative clustering. We take advantage of the existence of closed-form marginal distributions to estimate the model hyper-parameters using empirical Bayes, t...
Sparse Bayesian hierarchical modeling of high-dimensional clustering problems
Clustering is one of the most widely used procedures in the analysis of microarray data, for example with the goal of discovering cancer subtypes based on observed heterogeneity of genetic marks between different tissues. It is well-known that in such high-dimensional settings, the existence of many noise variables can overwhelm the few signals embedded in the high-dimensional space. We propose ...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can be decisive when analyzing high-dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high-dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
Bayesian Variable Selection in Clustering High-Dimensional Data
Over the last decade, technological advances have generated an explosion of data with substantially smaller sample size relative to the number of covariates (p ≫ n). A common goal in the analysis of such data involves uncovering the group structure of the observations and identifying the discriminating variables. In this article we propose a methodology for addressing these problems simultaneousl...
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Publication date: 2008